Write pickle to file-like without intermediate in-memory buffer #37056
Before this change, calling pickle.dumps() created an in-memory byte buffer, negating the advantage of zero-copy pickle protocol 5. After this change, pickle.dump writes directly to the open file (or file-like object), cutting peak memory in half in most cases.
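The difference between the two patterns can be sketched with the standard library alone. This is a minimal illustration, not the pandas implementation: `write_via_dumps` and `write_via_dump` are hypothetical helper names, and a plain dict stands in for a DataFrame.

```python
import io
import pickle

# Hypothetical stand-in for a large object such as a DataFrame.
obj = {"values": list(range(1000))}


def write_via_dumps(obj, handle, protocol=pickle.HIGHEST_PROTOCOL):
    # Old pattern: pickle.dumps() builds the whole pickle as one bytes
    # object in memory, and only then is it written to the handle.
    handle.write(pickle.dumps(obj, protocol=protocol))


def write_via_dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL):
    # New pattern: pickle.dump() streams into the handle, so no
    # intermediate full-size bytes object is needed.
    pickle.dump(obj, handle, protocol=protocol)


buf_old, buf_new = io.BytesIO(), io.BytesIO()
write_via_dumps(obj, buf_old)
write_via_dump(obj, buf_new)

# Both handles contain a pickle that loads back to the original object.
buf_new.seek(0)
restored_old = pickle.loads(buf_old.getvalue())
restored_new = pickle.load(buf_new)
```

Any object with a `write` method accepting bytes can serve as the handle, so the same call works for real files opened in binary mode.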
nice, can you add a memory asv and show the results of this, plus a whatsnew note for the 1.2 perf section.
also, to actually close this issue i think it would be good to add some tests that explicitly set the pickle protocol to 5
Isn't that covered by py38 tests where pickle.HIGHEST_PROTOCOL (==5) is used by default?
probably but let's make it explicit as well (e.g. parameterize on 4, 5, highest when PY38 or greater)
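One way the explicit protocol coverage could look, sketched with the standard library only. The names `PROTOCOLS`, `roundtrip`, and `sample` are assumptions for illustration, and a plain dict stands in for a DataFrame; in a pytest suite the same list could be passed to `pytest.mark.parametrize`.

```python
import io
import pickle
import sys

# Protocol 5 only exists on Python 3.8+, so add it conditionally.
PROTOCOLS = [4, pickle.HIGHEST_PROTOCOL]
if sys.version_info >= (3, 8):
    PROTOCOLS.insert(1, 5)


def roundtrip(obj, protocol):
    # Write straight to a file-like object, then read the pickle back.
    buf = io.BytesIO()
    pickle.dump(obj, buf, protocol=protocol)
    buf.seek(0)
    return pickle.load(buf)


# Hypothetical sample payload; the real tests would use a DataFrame.
sample = {"a": [1, 2, 3], "b": b"bytes", "c": (1.5, None)}
results = {protocol: roundtrip(sample, protocol) for protocol in PROTOCOLS}
```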
Added ASV peakmem_ benchmark. The benchmark currently uses an ~11MB dataframe - I have no experience with ASV to tell how reliable memory benchmarking is for small-ish objects, any advice appreciated.
it should show a diff here in any event. what kind of results are you seeing?
Yep, added results above -
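For readers unfamiliar with ASV, a peak-memory benchmark is a class method whose name starts with `peakmem_`. The sketch below is not the benchmark added in this PR: the class name `PickleToFile` is hypothetical, and an ~11 MB bytes object stands in for the dataframe.

```python
import os
import pickle
import tempfile


class PickleToFile:
    # ASV calls setup() before and teardown() after each benchmark.
    def setup(self):
        # Roughly 11 MB payload, standing in for the dataframe.
        self.obj = bytes(11 * 1024 * 1024)
        fd, self.path = tempfile.mkstemp(suffix=".pkl")
        os.close(fd)

    def teardown(self):
        os.remove(self.path)

    def peakmem_dump_to_file(self):
        # ASV records the peak memory of the process running this method.
        with open(self.path, "wb") as handle:
            pickle.dump(self.obj, handle, protocol=pickle.HIGHEST_PROTOCOL)
```

Because the measurement covers the whole process, a small payload can be hidden by interpreter and import overhead, which is one reason a larger object tends to show the difference more clearly.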
lgtm ping on green. can you post the results of the asv (in the top of the PR, just edit it)
lgtm @ig248 just resolved the conflict and can merge on green.
thanks @ig248 very nice!
Profiling was done with pandas@master and Python 3.8.5.
Related issues:
Update: ASV results
$ asv continuous -f 1.1 origin/master HEAD -b pickle
yields:
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff